
feat: add structured filters to search_people #320

Open
arwinsoetanto wants to merge 764 commits into stickerdaniel:main from
arwinsoetanto:feat/search-people-structured-filters

Conversation

@arwinsoetanto

Summary

  • Add 6 new optional filter parameters to search_people: current_company, past_company, school, title, network, industry
  • Add _build_people_search_url() static method following the existing _build_job_search_url() pattern
  • Add _format_bracket_list() helper for LinkedIn's bracket-list URL syntax (e.g. ["103334640","162479"])
  • Add _NETWORK_MAP for human-readable network degree values

Motivation

search_jobs has rich structured filters but search_people only has keywords + location. This makes people search unreliable for the primary MCP use case: LLM-driven LinkedIn research.

Real-world case study: I was using Claude Code with this MCP server to identify the hiring manager for a Product Manager role at a 74-person startup. Keyword searches like "Company Name" VP director head of product returned mostly people at other companies (fuzzy matching), and missed the actual founding Head of Product because she was a 3rd+ connection — LinkedIn's relevance ranking buried her behind 2nd-degree connections at unrelated companies.

A single structured query with current_company="103334640" would have returned all employees at that specific company, regardless of connection degree. This is exactly what LinkedIn's UI does when you click "Current company" in the filters panel.

LinkedIn URL parameter mapping

| Parameter | URL param | Example |
|-----------|-----------|---------|
| `current_company` | `currentCompany` | `["103334640"]` |
| `past_company` | `pastCompany` | `["103334640"]` |
| `school` | `schoolFilter` | `["1790"]` |
| `title` | `titleFreeText` | `product%20manager` |
| `network` | `network` | `["F","S"]` |
| `industry` | `industry` | `["14"]` |
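To illustrate how the mapping above composes into a search URL, here is a minimal sketch using standard-library encoding. It is not the PR's actual `_build_people_search_url()` implementation, and encoding details (e.g. `+` vs `%20` for spaces) may differ from what LinkedIn's UI emits:

```python
from urllib.parse import urlencode

# Illustrative only: shows how the parameter mapping above becomes a
# query string. The PR's real builder lives in extractor.py.
params = {
    "keywords": "product manager",
    "currentCompany": '["103334640"]',  # bracket-list of company IDs
    "network": '["F","S"]',             # 1st- and 2nd-degree connections
}
url = "https://www.linkedin.com/search/results/people/?" + urlencode(params)
print(url)
```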

Implementation notes

  • Follows the exact same pattern as _build_job_search_url() / search_jobs — filter maps, _normalize_csv(), URL parameter building
  • _format_bracket_list() converts comma-separated IDs to LinkedIn's ["id1","id2"] bracket syntax
  • _NETWORK_MAP maps human-readable "first", "second", "third" to LinkedIn's "F", "S", "O" codes
  • All new parameters are optional — fully backward compatible, existing behavior unchanged when omitted
  • Docstrings include guidance for LLMs on where to find company/school IDs (from prior get_company_profile or get_person_profile calls)
  • Linter passes (ruff check)

Test plan

  • _format_bracket_list("103334640,162479") produces ["103334640","162479"]
  • _build_people_search_url(keywords="product", current_company="103334640", title="head", network="first,second") produces correct URL with encoded filters
  • ruff check passes on both changed files
  • Import verification succeeds
  • Live test: search_people(keywords="", current_company="103334640") returns employees at the target company

🤖 Generated with Claude Code

stickerdaniel and others added 30 commits March 4, 2026 19:11
- Add unknown_sections to tool docstrings in person.py and company.py
- Add integration tests for unknown section names in both tools
- Document Greptile review endpoints in AGENTS.md
Patch _extract_overlay in test_posts_visits_recent_activity
for consistency with other TestScrapePersonUrls tests.
…r_scraping_replace_flag_enums_with_config_dicts

refactor(scraping): replace Flag enums with config dicts
…sync-tools-177

docs: sync manifest.json tools and features with current capabilities
…-file-maintenance

chore(deps): lock file maintenance
Lock file already has 3.1.0 since stickerdaniel#166; align pyproject.toml
floor to prevent accidental downgrades to v2.

Resolves: stickerdaniel#190

<!-- greptile_comment -->

<h3>Greptile Summary</h3>

This PR tightens the `fastmcp` minimum version constraint from `>=2.14.0` to `>=3.0.0` in `pyproject.toml` (and the corresponding `uv.lock` metadata), preventing any future resolver from backtracking to the incompatible v2 series. The lock file has already been pinning `fastmcp==3.1.0` since PR stickerdaniel#166, so there is no runtime impact — this is purely a spec/metadata alignment.

- `pyproject.toml`: `fastmcp` floor raised to `>=3.0.0`
- `uv.lock`: `package.metadata.requires-dist` updated to match; the resolved package entry (`3.1.0`) is unchanged
- No upper-bound cap (`<4.0.0`) is set, which is consistent with the project's existing open-ended constraints for all other dependencies

<h3>Confidence Score: 5/5</h3>

- This PR is safe to merge — it is a pure metadata alignment with no functional or runtime impact.
- The locked version was already `3.1.0` before this PR; the only change is raising the declared floor to match. Both modified lines are trivially correct, consistent with each other, and have no side-effects on the installed environment.
- No files require special attention.

<h3>Important Files Changed</h3>




| Filename | Overview |
|----------|----------|
| pyproject.toml | Single-line change updating the `fastmcp` floor constraint from `>=2.14.0` to `>=3.0.0`, aligning with the already-resolved version in the lock file. |
| uv.lock | Auto-generated lock file metadata updated to reflect the new `>=3.0.0` specifier; the resolved `fastmcp` version (3.1.0) was already correct and unchanged. |




<h3>Flowchart</h3>

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["pyproject.toml\nfastmcp >=3.0.0"] -->|uv resolves| B["uv.lock\nfastmcp 3.1.0 (pinned)"]
    B --> C["Installed environment\nfastmcp 3.1.0"]
    D["Old constraint\nfastmcp >=2.14.0"] -. "could resolve to" .-> E["fastmcp 2.x\n(incompatible)"]
    style D fill:#f9d0d0,stroke:#c00
    style E fill:#f9d0d0,stroke:#c00
    style A fill:#d0f0d0,stroke:#060
    style B fill:#d0f0d0,stroke:#060
    style C fill:#d0f0d0,stroke:#060
```

<sub>Last reviewed commit: 7d2363e</sub>

<!-- greptile_other_comments_section -->

<!-- /greptile_comment -->
Replace dict-returning handle_tool_error() with raise_tool_error()
that raises FastMCP ToolError for known exceptions. Unknown exceptions
re-raise as-is for mask_error_details=True to handle.

Resolves: stickerdaniel#185
Add logger.error with exc_info for unknown exceptions before re-raising,
and add test coverage for AuthenticationError and ElementNotFoundError.
Re-add optional context parameter to raise_tool_error() for log
correlation, and add test for base LinkedInScraperException branch.
Add catch-all comment on base exception branch and NoReturn
inline comments on all raise_tool_error() call sites.
…eps_bump_fastmcp_constraint_to_3.0.0

refactor(error-handler): replace handle_tool_error with ToolError
Replace repeated ensure_authenticated/get_or_create_browser/
LinkedInExtractor boilerplate in all 6 tool functions with
FastMCP Depends()-based dependency injection via a single
get_extractor() factory in dependencies.py.

Resolves: stickerdaniel#186
Updated the get_extractor function to route errors through raise_tool_error, ensuring that MCP clients receive structured ToolError responses for authentication failures. Added a test to verify that authentication errors are correctly handled and produce the expected ToolError response.
…r_tools_use_depends_to_inject_extractor

refactor(tools): Use Depends() to inject extractor
Replace ToolAnnotations(...) with plain dicts, move title to
top-level @mcp.tool() param, and add category tags to all tools.

Resolves: stickerdaniel#189
…ickerdaniel#198)

Replace ToolAnnotations(...) with plain dicts, move title to
top-level @mcp.tool() param, and add category tags to all tools.

Resolves: stickerdaniel#189

<!-- greptile_comment -->

<h3>Greptile Summary</h3>

This PR is a clean, well-scoped refactoring that modernises tool metadata across all four changed files to align with the FastMCP 3.x API. It introduces no functional or behavioural changes.

Key changes:
- Removes the `ToolAnnotations(...)` Pydantic wrapper in `company.py`, `job.py`, and `person.py`, replacing it with plain `dict` syntax for the `annotations` parameter — the simpler form supported by FastMCP 3.x.
- Moves `title` from inside `ToolAnnotations` to a top-level keyword argument on `@mcp.tool()`, matching the updated FastMCP 3.x decorator signature.
- Drops the now-redundant `destructiveHint=False` from all read-only tools. Per the MCP spec, `destructiveHint` is only meaningful when `readOnlyHint` is `false`, so omitting it from tools that already declare `readOnlyHint=True` is semantically equivalent.
- Adds `tags` (as Python `set` literals) to every tool for categorisation (`"company"`, `"job"`, `"person"`, `"scraping"`, `"search"`, `"session"`).
- Enriches the previously unannotated `close_session` tool in `server.py` with a title, `destructiveHint=True`, and the `"session"` tag — accurately describing its destructive nature.

The existing test suite in `tests/test_tools.py` covers all tool functions but does not assert on annotation metadata, so no test changes are required. The refactoring is consistent across all tool files and fits naturally within the project's layered registration pattern.
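The decorator change described above can be sketched as follows. This is illustrative, based only on the summary: `FakeMCP` stands in for the FastMCP server object so the snippet is self-contained, and the tool name/metadata are example values, not code from the PR:

```python
# FakeMCP mimics just enough of @mcp.tool() to show the new call shape.
class FakeMCP:
    def tool(self, **meta):
        def wrap(fn):
            fn.meta = meta  # record decorator kwargs for inspection
            return fn
        return wrap

mcp = FakeMCP()

# Before (FastMCP 2.x style, removed by the PR):
#   @mcp.tool(annotations=ToolAnnotations(title="Search People",
#             readOnlyHint=True, destructiveHint=False, openWorldHint=True))
# After (FastMCP 3.x style): top-level title, plain-dict annotations, tags.
@mcp.tool(
    title="Search People",
    annotations={"readOnlyHint": True, "openWorldHint": True},
    tags={"person", "search"},
)
def search_people(keywords: str = "") -> dict:
    return {"keywords": keywords}

print(search_people.meta["annotations"])  # {'readOnlyHint': True, 'openWorldHint': True}
```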

<h3>Confidence Score: 5/5</h3>

- This PR is safe to merge — it is a pure metadata/annotation refactoring with no changes to tool logic, inputs, outputs, or error handling.
- All changes are limited to decorator parameters (`title`, `annotations`, `tags`). The `annotations` dict values are semantically equivalent to the removed `ToolAnnotations` objects, `destructiveHint=False` is correctly dropped only for `readOnlyHint=True` tools, and the new `close_session` annotations accurately reflect its destructive nature. No business logic, scraping behaviour, or error paths were altered.
- No files require special attention.

<h3>Flowchart</h3>

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["@mcp.tool() decorator"] --> B{Annotation style}
    B -->|Before| C["ToolAnnotations(title=..., readOnlyHint=..., destructiveHint=False, openWorldHint=...)"]
    B -->|After| D["title='...' (top-level param)\nannotations={'readOnlyHint': True, 'openWorldHint': True}\ntags={'category', 'type'}"]
    D --> E["person tools\n(get_person_profile, search_people)"]
    D --> F["company tools\n(get_company_profile, get_company_posts)"]
    D --> G["job tools\n(get_job_details, search_jobs)"]
    D --> H["session tool\n(close_session)\nannotations={'destructiveHint': True}"]
```

<sub>Last reviewed commit: c5bf554</sub>

<!-- greptile_other_comments_section -->

<!-- /greptile_comment -->
Use lowercase dict instead of Dict, add auth validation log line
…r_server_split_lifespan_into_composable_browser_auth_lifespans

refactor(server): Split lifespan into composable browser + auth lifespans
# Conflicts:
#	linkedin_mcp_server/server.py
#	linkedin_mcp_server/tools/company.py
#	linkedin_mcp_server/tools/job.py
#	linkedin_mcp_server/tools/person.py
github-actions Bot and others added 26 commits March 29, 2026 09:36
…file

- Add get_sidebar_profiles() extractor method that scrapes sidebar sections
  (More profiles for you, Explore premium profiles, People you may know),
  follows Show all links, and skips any /premium redirects
- Add _extract_profile_urn() helper that reads the recipient URN from the
  Message button compose href on the current profile page
- Expose profile_urn in scrape_person results when available
- Register get_sidebar_profiles MCP tool in person.py
- Add to manifest.json and README tool table
- Tests: TestGetSidebarProfiles, TestExtractProfileUrn, TestScrapePersonProfileUrn,
  TestGetSidebarProfilesTool
Adds four messaging tools: get_inbox, get_conversation, search_conversations,
and send_message (with profile_urn bypass for reliable compose URL routing).
Includes all browser helper methods and full test coverage.
…nnect-new

feat: add get_sidebar_profiles tool and profile_urn in get_person_profile
…ependencies

chore(deps): update ci dependencies
- Replace custom _secure_profile_dirs/_set_private_mode with thin
  _harden_linkedin_tree that uses secure_mkdir from common_utils
- Fix export_storage_state: chmod 0o600 after Playwright writes
- Add test for export_storage_state permission hardening
- Add test for no-op outside .linkedin-mcp tree
- Revert unrelated loaders.py change
…e-profile-perms

Harden .linkedin-mcp profile/cookie permissions
- Remove unused selector constants (_MESSAGING_THREAD_LINK_SELECTOR, _MESSAGING_RESULT_ITEM_SELECTOR, _MESSAGING_SEND_SELECTOR)
- Remove dead _conversation_thread_cache (new extractor per tool call)
- Add AuthenticationError handling to get_sidebar_profiles and all messaging tools
- Pass CSS selector as evaluate() arg instead of f-string interpolation
- Replace deprecated execCommand with press_sequentially
- Guard sidebar container walk against depth-limit exhaustion
- Update scrape_person docstring to document profile_urn return key
- Add messaging tools to README tool-status table
LinkedIn redirects /messaging/ to the most recent thread; capture
baseline_thread_id after the SPA settles so search-selected threads
can be distinguished from the auto-opened one.
…connect

feat: linkedin messaging, get sidebar profiles
…IDs (stickerdaniel#300)

* fix(scraping): Respect --timeout for messaging, recognize thread URLs

Remove all hardcoded timeout=5000 from the send_message flow and
messaging helpers so they fall through to the page-level default
set from BrowserConfig.default_timeout (configurable via --timeout).

Also add /messaging/thread/ URL recognition to classify_link so
conversation thread references are captured when they appear in
search results or conversation detail views. Raise inbox reference
cap to 30 and add proper section context labels.

Resolves: stickerdaniel#296
See also: stickerdaniel#297

* fix(scraping): Extract conversation thread IDs from inbox via click-and-capture

LinkedIn's conversation sidebar uses JS click handlers instead of <a>
tags, so anchor extraction cannot capture thread IDs. Click each
conversation item and read the resulting SPA URL change to build
conversation references with thread_id and participant name.

Before: get_inbox returned 2 references (active conversation only)
After: get_inbox returns all conversation thread IDs (10+ refs)

Resolves: stickerdaniel#297

* fix(scraping): Respect --timeout across all remaining scraping methods

Remove the remaining 10 hardcoded timeout=5000 from profile scraping,
connection flow, modal detection, sidebar profiles, conversation
resolution, and job search. All Playwright calls now use the page-level
default from BrowserConfig.default_timeout.

Resolves: stickerdaniel#299

* fix: Address PR review feedback

- Use saved inbox URL instead of self._page.url (P1: wrong URL after clicks)
- Fix docstring to clarify 2s recipient-picker probe is intentional
- Replace class-name selectors with aria-label discovery + minimal class fallback
- Dedupe references after merging conversation and anchor refs
…erdaniel#303)

First-time uvx runs download ~77 Python packages including the 39MB
patchright wheel. On slow connections, uv's default 30s HTTP timeout
can cause silent failures before the server process starts.

Co-authored-by: Daniel Sticker <sticker@ngenn.net>
Move UV_HTTP_TIMEOUT=300 into the main uvx config example so it's the
default, not an optional troubleshooting step. Fix grammar in the
troubleshooting note.

Co-authored-by: Daniel Sticker <sticker@ngenn.net>
* docs: use @latest tag in uvx config for auto-updates

Without @latest, uvx caches the first downloaded version forever.
Adding @latest ensures uvx checks PyPI on each client launch and
pulls new versions automatically.

* docs: apply @latest consistently to all uvx invocations

Update --login examples in README.md and docs/docker-hub.md to use
linkedin-scraper-mcp@latest for consistency with the MCP config.

---------

Co-authored-by: Daniel Sticker <sticker@ngenn.net>
…, network, etc.)

Add 6 optional filter parameters to search_people matching LinkedIn's
URL-based people search filters: current_company, past_company, school,
title, network, and industry. Follows the same pattern already used by
_build_job_search_url() for search_jobs.

This makes people search significantly more reliable for LLM workflows
where keyword-only search returns noisy results biased by connection
proximity.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@greptile-apps

greptile-apps Bot commented Apr 4, 2026

Greptile Summary

This PR adds 6 optional structured filter parameters (current_company, past_company, school, title, network, industry) to search_people, directly mirroring the established _build_job_search_url / search_jobs pattern. All new parameters are optional with None defaults, making the change fully backward compatible with existing callers.

Confidence Score: 5/5

Safe to merge — all changes are additive and backward compatible with no breaking changes to existing behavior.

The implementation closely follows the established _build_job_search_url pattern, all new parameters are optional with None defaults, and the only finding is a minor edge-case robustness issue in _format_bracket_list that is unlikely to affect real LLM usage.

No files require special attention.

Important Files Changed

| Filename | Overview |
|----------|----------|
| `linkedin_mcp_server/scraping/extractor.py` | Adds `_NETWORK_MAP`, `_format_bracket_list`, and `_build_people_search_url` following the existing `_build_job_search_url` pattern; cleanly delegates to helper functions. |
| `linkedin_mcp_server/tools/person.py` | Adds 6 optional filter params to the `search_people` MCP tool and correctly forwards them to the extractor. |

Sequence Diagram

```mermaid
sequenceDiagram
    participant LLM as LLM/Client
    participant Tool as search_people (person.py)
    participant Extractor as LinkedInExtractor
    participant Builder as _build_people_search_url
    participant Helpers as _format_bracket_list / _normalize_csv
    participant LinkedIn as LinkedIn Search

    LLM->>Tool: search_people(keywords, current_company, network, ...)
    Tool->>Extractor: extractor.search_people(keywords, current_company, network, ...)
    Extractor->>Builder: _build_people_search_url(keywords, current_company, network, ...)
    Builder->>Helpers: _format_bracket_list(current_company)
    Helpers-->>Builder: ["103334640"]
    Builder->>Helpers: _normalize_csv(network, _NETWORK_MAP) + _format_bracket_list
    Helpers-->>Builder: ["F","S"]
    Builder-->>Extractor: https://linkedin.com/search/results/people/?keywords=...
    Extractor->>LinkedIn: GET constructed URL
    LinkedIn-->>Extractor: search results page
    Extractor-->>Tool: {url, sections, references}
    Tool-->>LLM: {url, sections, references}
```

Comment on lines +153 to +155:

```python
parts = [v.strip() for v in value.split(",")]
inner = ",".join('"' + p + '"' for p in parts)
return "[" + inner + "]"
```

**P2**: Whitespace-only inputs produce malformed bracket entries

If a caller passes a whitespace-only string (e.g. `"  "`), `if current_company:` is truthy, so `_format_bracket_list` is invoked and produces `[""]` — an invalid LinkedIn ID that would silently corrupt the filter. Filtering empty parts after stripping fixes this.

Suggested change:

```diff
-parts = [v.strip() for v in value.split(",")]
+parts = [v.strip() for v in value.split(",") if v.strip()]
 inner = ",".join('"' + p + '"' for p in parts)
 return "[" + inner + "]"
```
